Analysis of Automatic Stress Assignment in Slovene

Authors

  • Domen Marincic
  • Tea Tusar
  • Matjaz Gams
  • Tomaz Sef
Abstract

We tested the ability of humans and machines (data mining techniques) to assign stress to Slovene words. This is a challenging comparison for machines, since humans perform the task extremely well even on unknown words presented without any context. We aimed to find good machine-made models for stress assignment by applying new methods and by making use of a known theory about rules for stress assignment in Slovene. The upgraded data mining methods outperformed expert-defined rules on practically all subtasks, showing that data mining can more than compete with humans at constructing formal knowledge about stress assignment. In direct comparison with humans, however, the data mining methods still fell short of human performance on assigning stress to unknown words.

Similar resources

Automatic Accentuation of Words for Slovenian TTS System

The accentuation of unknown Slovene words represents a challenging task for automated solvers since in Slovenian, stress can be located on arbitrary syllables. Most words have only one stressed syllable, but there exist also words with no stress and words with more than one stress. Furthermore, different forms of the same word can be stressed differently. In this paper, we present a two level l...

Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language

This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). We applied a machine-learning technique (decision trees or boosted decision trees)...

Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words

Automatic lemmatization is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma (base form) to each word in a running text is not trivial, since for instance, nouns inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, sin...

Learning to Lemmatise Slovene Words

Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as wo...

Effect of Cognitive Behavioral Therapy Based Psychoeducation Program on University Students' Automatic Thoughts, Perceived Stress and Self-Efficacy Levels

Background: University life is a special period in which students take full responsibility for their own lives, especially as individuals, and therefore includes many positive and negative situations. As a result of this situation, they need serious psychological support in order to cope with the potential or real problems they experience. The research was conducted to determine the effect of C...

Journal title:
  • Informatica, Lith. Acad. Sci.

Volume 20

Publication date: 2009